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ABSTRACT 

Aims. We outline the Bayesian approach to inferring /nl, the level of non-Gaussianities of local type. Phrasing /nl 
inference in a Bayesian framework takes advantage of existing techniques to account for instrumental effects and 
foreground contamination in CMB data and takes into account uncertainties in the cosmological parameters in an 
unambiguous way. 

Methods. We derive closed form expressions for the joint posterior of /nl and the reconstructed underlying curvature 
perturbation, <E>, and deduce the conditional probability densities for /nl and $. Completing the inference problem 
amounts to finding the marginal density for /nl- For realistic data sets the necessary integrations are intractable. 
We propose an exact Hamiltonian sampling algorithm to generate correlated samples from the /nl posterior. For 
sufficiently high signal-to-noise ratios, we can exploit the assumption of weak non-Gaussianity to find a direct Monte 
Carlo technique to generate independent samples from the posterior distribution for /nl- We illustrate our approach 
using a simplified toy model of CMB data for the simple case of a 1-D sky. 

Results. When applied to our toy problem, we find that, in the limit of high signal-to-noise, the sampling efficiency of 
the approximate algorithm outperforms that of Hamiltonian sampling by two orders of magnitude. When /nl is not 
significantly constrained by the data, the more efficient, approximate algorithm biases the posterior density towards 
/nl = 0. 

Key words, cosmic microwave background - cosmological parameters - Methods: data analysis - Methods: numerical - 
Methods: statistical 

1. Introduction 



The analysis of cosmic microwave background (CMB) radiation data has considerably improved our understanding 
of cosmology and played a crucial role i n constraining the set of fundamental cosmological parameters of the universe 
(jSpergel et al.ll2007tlHinshaw et al.ll2009ll . This success is based on the intimate link between the temperature fluctuations 
' we observe today and the physical processes taking place in the very early univer se. Inflation is currently the favored 
theory predicting the shape of primordial perturbations (|Guthl Il98ll iLindd Il982), which in its canonical form leads 
to very small non-Gaussian ities that are far from being detectable by means of present-day experiments (Maldaccna 
2003; Ac auaviva et al.ll2003f ). However, inflation scenarios producing larger amounts of non-Gaussianity can naturally be 
constructed by breaking one or more of the following pro perties of canonical inflation: slow- roll, single-field, Bunch-Davies 
vacuum, or a canonical kinetic term (jBartolo et al. 2004). Thus, a positive detection of primordial non-Gaussianity would 
' allow us to rule out the simplest models. Combined with improving constraints on the scalar spectral index n s , the test 
for non-Gaussianity is therefore complementary to the search for gravitational waves as a means to test the physics of 
the early Universe. 

A common strategy for e stimating primordial non-Gaussianity is to examine a cubic combinations of filtered CMB sky 
maps (jKomatsu et al.ll2005D . This approach takes advantage of the specific bispectrum signatures produced by primordial 
non-Gaussianity and yi elds to a computationa lly efficient algorithm. When combined with the variance reduction tech- 
nique first described bv lCreminelli et al.l (|2006f) these bispectrum-based techniques are close to optimal, where optimality 
is defined as saturation of the Cramer-Rao bound. Latel y, a more computa tionally costly minimum variance estimator 
has been implemented and applied to the WMAP5 data (jSmith et al.ll2009h . 

Recently, a Bayesian approach has been introduced in CMB power spectrum analysis an d applied successfully to 
WMAP data making use of Gibbs-sampling techniques ([ Jewell et alj|2004t IWandclt et al. 2004). Within this framework, 
one draws samples from the posterior probability density given the data without explicitly calculating it. The target 
probability distribution is finally constructed out of the samples directly, thus computationally costly evaluations of 
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the likelihood function or its derivatives are not necessary. Another advantage of the Bayesian analysis is that the 
method naturally offers the poss ibility to include a c onsistent treatment of the uncertainties associated with foreground 
emission or instrumental effects (|Eriksen et al.ll2008T ) . As it is possible to model CMB and foregrounds jointly, statistical 
interdependencies can be directly factored into the calculations. This is not straightforward in the frequentist approach 
where the data analysis is usually performed in consecutive steps. Yet another important and desirable feature is the 
fact that a Bayesian analysis obviates the necessity to specify fiducial parameters, whereas in the frequentist approach 
it is only possible to test one individual null hypothesis at a time. 

In this paper, we pursue the modest goal of developing the formalism for the extension of the Bayesian approach to 
the analysis of non-Gaussian signals, in particular to local models, where the primordial perturbations can be modeled 
as a spatially local, non-linear transformation of a Gaussian random field. Utilizing this method, we are able to write 
down the full posterior probability density function (PDF) of the level of non-Gaussianity. We demonstrate the principal 
aspects of our approach using a 1-D toy sky model. Although we draw our discussion on the example of CMB data 
analysis, the formalism presented here is of general validity and may also be applied within a different context. 

The paper is organized as follows. In Sect. [5]we give a short overview of the theoretical background used to characterize 
primordial perturbations. We present a new approximative approach to extract the amplitude of non-Gaussianities from 
a map in Sect.[3]and verify the method by means of a simple synthetic data model (Sect. HI . We compare the performance 
of our technique to an exact Hamiltonian Monte Carlo sampler which we developer in Sect. |5]and discuss the extensions 
of the model required to deal with a realistic CMB sky map (Sect. [7|). Finally, we summarize our results in Sect. [H 



2. Model of non-Gaussianity 

The expansion coefficients ag m of the observed CMB temperature anisotropies in harmonic space can be related to the 
primordial fluctuations via 

a e™ = ^J k 2 dkr 2 dr[^ em (r)gf t (k) + S em (r)gY (k)}j e (kr)+n em: (I) 

where $>e m {i") and Sg m (r) are the primordial curvature and isocurvature perturbations at comoving distance r, g" dl (fc) 
and g\ so {k) their corresponding transfer functions in momentum space. The spherical Bessel function of order i is denoted 
by ji(kr), bi includes beam sme aring effects, and ri£ m describes instrumental noise. As curvature perturbations dominate 
over isocurvature perturbations (jBean et al.ll2006l : lTrottall2007f ). we will neglect the contribution of Sg m in our subsequent 
analysis. 

Any non-Gaussian signature imprinted in the primordial perturbations will be transferred to the ai m according to Eq.[T] 
and is therefore detectable, in principle. Theoretical models pre dicting significant l evels of non-Gaussian contributions 
to the observed signal can be subdivided into two broad classes ([Babich et al.l 12004): one producing non-Gaussianity of 
local type, the other of equilateral type. The former kind of non-Gaussianity is ach i eved to very good approximation 
in mu lti-field inflation as described by the cur vaton model (iMoroi fc Takahashill2001l; lEnqvist fe S loth 20021 iLvth et al.l 
or in cyclic/ekpyrotic universe models (|Khourv et al.ll2001t ISteinhardt fc Turokll2002[ ). The latter type of non- 
Gaussianity is typically a result of sin gle field models with non-minimal Lagrangian including higher order derivatives 
(|Alishahiha et al.l[2^0llSenatorel[2C505l) . 

Concentrating on local models, we can parametrize the non-Gaussianity of $ by introducing an a dditional quadratic 
dependence on a purely Gaussian auxiliary field $l, that is local in real space, of the form (jVerde et al.l [2000; 
iKomatsu fe Spergell[200l 

*nlW = $ L (r) + / N iM(r) - <*£(r)>] , (2) 
where /nl is a dimensionless measure of the amplitude of non-Gaussianity. 



3. Bayesian inference of non-Gaussianity 

It has been shown to be fe asible to reconstruct the primordial curvature p otential out of temperature or temperature 
and polarization sky maps (|Yadav fc WandeTtll2005t lElsner fc~W andclt 2009.) , which allows searching for primordial non- 
Gaussianities more sensitively. Although the mapping from a 3D potential to a 2D CMB sky map is not invertiblc 
unambiguously, a unique solution can be found by requiring that the result minimizes the variance. In this conventional 
frequentist approa ch, the level of non-G aussianity and an estimate of its error is derived from a cubic combination of 
filtered sky maps (jKomatsu et al.l 12005). We will show in the following sections how to sample /nl from the data and 
unveil the full posterior PDF using a Bayesian approach. 

3.1. Joint probability distribution 

In our analysis we assume the data vector d to be a superposition of the CMB signal s and additive noise n 



d = Bs + n 
= BM $ + n , 



(3) 
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where information about observing strategy and the optical system are encoded in a pointing matrix B and M is a linear 
transformation matrix. In harmonic space, the signal is related to the primordial scalar perturbation as 

s tm = - I k 2 dkr 2 dr<Z> em (r)gf l (k)j e (kr) 



i 

= M$, m . (4) 

Our aim is to construct the posterior PDF of the amplitude of non-Gaussianities given the data, P(/nl|^). To do so, 
we subsume the remaining set of cosmological parameters to a vector 9 and calculate the joint distribution as 

P(d, $l, /nl, 0) = P(d|$ L , /nl, 9)P($ h \6)P(6)P(fa h ) . (5) 

Now, we can use Eq. [2] to express the probability for data d given $l, /nl, and 9 

P(d|$ L , / NL , 9) = 1 c ~ 1/2 [d-BM(<S>i J +f N L(1>l-{'fl)))] i N- 1 ld-BM (# L +/ NL ($J-($= )))] ^ ^ 

where N is the noise covariance matrix. The prior probability distribution for $l given 9 can be expressed by a multivariate 
Gaussian by definition. Using the covariance matrix P$ of the potential, we derive 

P^ L \9)^^^=e- 1 / 2 ^ 1 ^. (7) 
For flat priors P(/nl), P{&) we finally obtain 



P(d, $lJnl,9) ocexp 



(#j))))tjv- 



x(d- 



M($l + /nl($l 



(*£»)) 



(8) 



as an exact expression for the joint distribution up to a normalization factor. 

To derive the posterior density, P(/nlM), one has to marginalize the joint distribution over $l and 9. As it is not 
possible to calculate the high dimensional $l integral directly, an effective sampling scheme must be found to evaluate the 
expression by means of a Monte Carlo algorithm. One possibility would be to let a Gibbs sampler explore the parameter 
space. Unfortunately, we were not able to find an efficient sampling recipe from the conditional densities for /nl and $l 
as the variables are highly correlated. An algorithm that also generates correlated samples, but is potentially suitable for 
non-Gaussian densities and high degrees of correlation is the Hamiltonian Monte Carlo approach. We will return to this 
approach in Sect. |6l 

For now we attempt to go beyond correlated samplers and see whether we can develop an approximate scheme, 
valid in the limit of weak non-Gaussianity, to sample /nl independently. We start out by expanding the target posterior 
distribution into an integral of conditional probabilities over the non-linear potential $nl, 



P(/nl|0= / d$ Nh d6P(f NIj \$KuO)P($ Nh \d,d)P(e\d) 



(9) 

To construct the conditional probability P($nlM, &) in the integrand, we need to find an equivalent equation for the 
joint distribution (Eq. [S]) as a function of the field $nl- However, a simple analytic expression for the prior distribution 
of $nl does not exist because it is a non-linear transform of the Gaussian auxiliary field $l- To quantify the expected 
correction, we calculate its covariance matrix, 



(P* NL )y = (($nlM$nl)j> 



- (P*J 



(*L)i> + fLm^U^)") - ((*L)?)((<fL), 2 )] 
y + S/nl^*!,)^ • 



(10) 



As the covariance matrix P$ L is of the order 0(10 ) and the non-Gaussian contribution to $nl is known to be small, 
we neglect the higher order correction in the prior distribution in what follows. That is, we approximate the true prior 
probability function by a Gaussian distribution in $nl and in this way derive a simple expression for the joint density, 
as a function of $nl, 



P(d,$ 



NL, 



oc exp 



(d - BM^ h yN-\d - BM$ NL ) + *J, L J£ x $ 



^NL 



(11) 
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Note, that the approximation applies to the second term only, the first part of the expression remains unaffected. As 
this approximation is equivalent to imposing the prior belief of purely Gaussian primordial perturbations, we expect to 
underestimate /nl in the low signal-to-noise regime, as we tend to replace the Wiener filtered noise with purely Gaussian 
signal. Contrary, the method is unbiased when the likelihood dominates over the prior which is unlikely for data derived 
by the Planck mission. 

The direct evaluation of the joint distributions over a grid in the high dimensional parameter space is computationally 
not fea sible. One option would be to app roximate the PDF around its maximum to get an expression for the attributed 
errors (|TegmarkHl997t feond et ahlll998tl . These methods are still computationally expensive and can also not recover 
the full posterior. An alternative approach to overcome these problems is to draw samples from the PDF which is to be 
evaluated as we will discuss in the next section. 



3.2. Conditional probabilities 

To construct the target posterior density Eq. we have to find expressions for the conditional probabilities P($nlK 9) 
and -P(/nl|3>nl, The former distribution can easily be derived from the joint probability density Eq. [IT] Since the 
exponent is quadratic in $nl in our approximation, the conditional PDF of $nl given d and 9 is Gaussian. Therefore, 
we can calculate mean and variance of the distribution via differentiating the expression, 



($NL> = <($NL - <$NL» 2 } M^B^N-H 

<($nl - <$nl» 2 ) = [M^B^N^BM + P^Y 1 ■ 



(12) 



As a next step, we derive the conditional probability distribution of /nl for given <I>nl- This expression is not affected 
by the approximation and can be derived from a marginalization over $l, 



(13) 



P(/nl|$NL,0) = / d$LP(/NL|$L,$NL)-P($L|0) 

= J d$ L <5($ NL - $ L - /nl(* l - 



Using Eq. we can calculate the integral and obtain 



P(/nl|$nl' 



1 



2/nl(*l) s 



where <J>l is a function of /nl and can be regarded as inversion of Eq. [21 



$L = 



1 



2/ 



NL 



-1 



l + 4/ NL ($ 



NL 



7nl($ l » 



(14) 



(15) 



Note that we can resolve the ambiguity in sign in the weakly non-Gaussian limit (|Babichll2005l ). Because the absolute 
value of the elements of the second solution is typically larger by orders of magnitude, the probability of its realization 
is strongly disfavored by the prior P($l)- The factor of suppression is typically less than 10~ 1000 and further vanishing 
with decreasing /nl- 

After setting up the conditional densities, we now can sample from the distributions iteratively. First, we draw $nl 
from a Gaussian distribution using Eqs.[TJJ Then, /nl can be sampled according to Eq. [lousing the value of $nl derived 
in the preceding step. Thus, the sampling scheme reads as 



<^ L ^P($ NL |d,0) 
/nl^(/nl|$ nl ,#)- 



(16) 



Note that this is not Gibbs sampling. For a fixed set of cosmological parameters, we can chain together samples 
from the conditional densities above, producing independent /nl samples. The efficiency of such a direct Monte Carlo 
sampler is therefore expected to be much higher than that of a Gibbs sampler, which, in the general case, would produce 
correlated samples. 

As an extension of the sampling scheme presented so far, we sketch an approach to account for uncertainties in 
cosmological parameters and foreground contributions. Complementing the scheme (Eqs. I16[) by an additional step 
allows to take into account the error in the parameters 9, 



^l^P^nlM,^- 1 ) 
/nl^/nlI^nl,^ 1 ) 



(17) 
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where the last equation updates the cosmological parameters that can be sampled from the data by means of standard 
Monte Carlo analysis toolsj. Now, the scheme formally reads as a Gibbs sampler and can in principle take into account the 
correlation among /nl and the other cosmological parameters exactly. In practice, however, the impact of a non- vanishing 
/nl is expected to be negligible, i.e. P(0\d, /nl) ~ P(0\d). Likewise, we can allow for an additional sampling step to deal 
with foreground contributions, e.g. from synchrotron, free-free, and dust emission. Foreground templates f sync, free, dust ^ 
that are available for these sources, can be subtr acted with amplitude s c sync > ^ ree ' dust which are sampled from the data 
in each iteration, c l P{c\d, f s y nc > f ree > dus4 , Q l ) (jWandelt et al.ll2004f ). Alternatively, component separation techniques 
could be used to ta ke foreground contaminants into account without the need to rely on a priori defined templates 
(Eri ksen et al.ll2006[) . The traditional approach to deal with point sources is to mask affected regions of the sky to exclude 
them from the analysis. Discrete object detection has been demonstrated to be possible within a Bayesian framework 
(|Hobson fc McLachlan|[200llCarvalho et al.| [2009). and can be fully included into the sampling chain. However, as sources 
are only successfully detected down to an experiment-specific flux limit, a residue- free removal of their contribution is in 
general not possible. 

As the angular resolution of sky maps produced by existing CMB experiments like WMAP is high and will further 
increase once data of the Planck satellite mission becomes available, computational feasibility of an analysis tool is an 
issue. The speed of our method in a full implementation is limited by harmonic transforms which scale as 0{N^[^) and 
are needed to calculate the primordial perturbations at numero us shells at distances from the cosmic h orizon to zero. 
Thus, it shows the same scaling relation as fast cubic estimators (|Komatsu et aLlfeOOSl ; lYadav et al.ll2007|) . albeit with a 
larger prefactor. 



4. Implementation and Discussion 

To verify our results and demonstrate the applicability of the method, we implemented a simple 1-D toy model. We 
considered a vector $l of random numbers generated from a heptadiagonal covariance matrix with elements 



/ 



.. 0.1 0.2 0.5 1.0 0.5 0.2 0.1 . . 

/ 



x 10 _1U . (18) 



Then, a data vector with weak non-Gaussianity according to Eq. [5] was produced and superimposed with Gaussian white 
noise. Constructed in this way, it is of the order 0(1O~ 5 ), thus the amplitude of the resulting signal s is comparable to 
CMB anisotropies. 

The data vector had a length of 10 6 pixels; for simplicity, we set the beam function B and the linear transformation 
matrix M to unity. This setup allows a brute force implementation of all equations at a sufficient computational speed. 
We define the signal-to-noise ratio (S/N) per pixel as the standard deviation of the input signal divided by the standard 
deviation of the additive noise. It was chosen in the range 0.5-10 to model the typical S/N per pixel of most CMB 
experiments. To reconstruct the signal, we draw 1000 samples according to the scheme in Eq. 1161 

Whereas the <!>nl can be generated directly from a simple Gaussian distribution with known mean and variance, the 
construction of the /nl is slightly more complex. For each 3>nLj we ran a Metropolis Hastings algorithm with symmetric 
Gaussian proposal density with a width comparable to that of the target density and started the chain at /nl = 0. 
We run the /nl chain to convergence. We ensured that after ten accepted steps the sampler has decorrelated from the 
starting point. Our t ests conducted wi t h sev eral chains run in parallel give 1 < R < 1.01, where R is the convergence 
statistic proposed bv lGelman fe Rubinl (|1992f ). We record the last element of the chain as the new /nl sample. 

Finally, we compared the obtained sets of values {'I'nl}' {/nl! to the initial data. An example is shown in Fig. [TJ 
where we illustrate the reconstruction of a given potential $nl for different signal-to-noise ratios per pixel. The 1 — a 
error bounds are calculated from the 16 % and 84 % quantile of the generated sample. Typical posterior densities for 
/nl as derived from the sample can be seen in Fig. [2J We considered the cases /nl = and /nl = 200 with S/N = 10 
per pixel and show the distributions generated from 1000 draws. The derived posterior densities possesses a mean value 
of /nl = 6 ± 40 and /nl = 201 ± 40, respectively. The width of the posterior is determined by both the shape of 
the conditional PDF of /nl for a given <!>nl and the shift of this distribution for different draws of $nl (Fig. [3]). The 
analysis of several data sets indicate that the approximation does not bias the posterior density if the data are decisive. 
We illustrate this issue in the left panel of Fig. 21 where we show the distribution of the mean values (/nl) of the 
posterior density constructed from 100 independent simulations. For an input value of /nl = 200 we derive a mean value 
(/nl) = 199.3±34.8 and conclude that our sampler is unbiased for these input parameters. For a high noise level, however, 
the $nl can always be sampled such that they are purely Gaussian fields and thus the resulting PDF for /nl is then 
shifted towards /nl = 0. This behavior is demonstrated in Fig. [5] where we compare the constructed posterior density 
for the cases S/N — 10 and S/N = 0.5 per pixel. If the noise level becomes high, the approximated prior distribution 
dominates and leads to both, a systematic displacement and an artificially reduced width of the posterior. Therefore, 
the sampler constructed here is conservative in a sense that it will tend to underpredict the value of /nl if the data are 
ambiguous. 



1 E.g. as described in lLewis fc Bridle! (120021 ) 
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Fig. 1. Examples of reconstructed potentials $nl- Left panel: The input parameters for the calculation were /nl = 200 
and S/N — 1. Right panel: Analysis of the same data set for a signal-to-noise ratio of S/N — 10. For clarity, we show only 
40 elements of the $NL-vector (thick solid line) and its reconstruction (thin solid line) as well as the 1 — a error bounds 
(dashed lines). As the difference between $nl and the linear potential $l is very small, $l can not be distinguished 
from $nl in this plot. In both cases 1000 samples were drawn. 



An example of the evolution of the drawn /nl samples with time can be seen in Fig. |6l where we in addition show 
the corresponding autocorrelation function as defined via 

g(AJV) = 1 ^ - • ffi AJV - M) ) (19) 

where N is the length of the generated /nl chain with mean /i and variance a 2 . The uncorrelated samples of /nl ensure 
an excellent mixing of the chain resulting in a fast convergence rate. 



5. Optimality 

In a frequentist analysis, parameter inference corresponds to finding an estimator that enables to compute the most 
probable value of the quantity of interest as well as a bound for the error. Ideally, the estimator is unbiased and optimal, 
i.e. it's expectation value coincides with the true value of the parameter and the error satisfies the Cramer- Rao bound. 
Contrary, in a Bayesian approach, one calculates the full probability distribution of the parameter directly. Strictly 
speaking, optimality is therefore an ill-defined term within the Bayesian framework. All we have to show is that the 
approximation adopted in Eq. [IT] does n °t affect the outcome of the calculation significantly. Note that the simplification 
corresponds to imposing the prior of a purely Gaussian data set. In the case of the CMB, this is a very reasonable 
assumption because up to now no detection of /nl has been reported. 

To investigate the effects of the approximation, we checked the dependence of the width of the posterior distribution 
on /nl by running a set of simulations with varying input values /nl = 0, 50, 100, 150, 200, 250. The estimated standard 
deviation er/ NL of the drawn /nl samples, each averaged over 10 simulation runs, are depicted in the right panel of Fig. [H 
Contrary to the KSW estimator that shows an increase of cr/ NL with /nl, we find no such indication of sub-optimal 
behavior in the relevant region of small non-Gaussianity. In particular, as the width of the distribution stays constant 
in the limit /nl — > where our approximated equations evolve into the exact expressions, we conclude that the adopted 
simplification does not affect the result significantly. 

This finding can also be interpreted from a different point of view: It is possible to define a frequentist estimator for 
/nl based on the mean of the posterior distribution. Our results indicate that such an estimator is unbiased in the high 
signal-to-noise regime. 

We apply an additional test in the next section where we compare our sampling algorithm to a slower but exact 
scheme. 



6. Hamiltonian Monte Carlo sampling 

In addition to the sampling technique presented above, we tested whether an exact Hamiltonian Monte Carlo (HMC) 
sampler is applicable to the problem. Within this approach one uses the methods developed in classical mechanics to 
describe the motion of particles in potentials. The quantity of interest is regard ed as the spatial coordinate of a particle 
and the potential well corresponds to the PDF to evaluate (jDuane et al.l ll987). To each variable (/nl, ■ • ■ , ^L,n)j 
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Fig. 2. Examples of a constructed posterior distribution for /nl- The input parameters used in this runs were N p i X = 10 6 , 
S/N — 10 and /nl = (left panel) or /nl = 200 (right panel). For each parameter combination 1000 samples were drawn. 
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Fig. 3. Build-up of the posterior distribution of /nl- We depict the conditional probability distributions ^(/nlI^nl, &) 
for several realizations of $nl (left panel) and the constructed posterior after 1000 drawn samples (right panel). The 
input parameters were chosen to be N p i X = 10 6 , /nl = 0, and S/N = 2. 



a mass and a momentum is assigned and the system is evolved deterministically from a starting point according to the 
Hamilton equations of motion. 

Th e applicability of H MC sampling techniq ues to cosmological parameter estimation has been demonstrated in lHaiianl 
(|2007|) . and the authors of iTavlor et al.l ()2008[ ) compared HMC with Gibbs sampling for CMB power spectrum analysis. 
To apply HMC sampling to /nl inference, we deduced the expression of the Hamiltonian 

H =V^-log[P(d,$ L ,/ NL ,0)], (20) 

where the potential is related to the PDF as defined in Eq. \8\ The Hamilton equations of motion, 

dxi dH 
dt dpi ' 

d Pl _ dH _ aiQg[P(rf,$L,/NL,0)] (21) 

dt dxi dxi 
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Fig. 4. Properties of the sampler. Left panel: Shown is the distribution of the derived mean values of /nl from 100 
simulations for a fiducial value of /nl = 200. Right panel: We display the estimated standard deviation o/ NL of the drawn 
/nl samples as a function of /nl- Each data point is averaged over 10 simulations. The input parameters used in this 
runs were N p i X = 10 6 and S/N = 10, in each simulation 1000 samples were drawn. 
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Fig. 5. Impact of the signal-to-noise ratio on the approximate sampling scheme. Left panel: Example of a constructed 
posterior distribution for S/N = 10. Middle panel: Analysis of the same data set, but for S/N = 0.5. At high noise level, 
the distribution becomes too narrow and systematically shifted towards /nl = 0. Right panel: For comparison, we show 
the analysis of the data set at S/N = 0.5 using exact Hamiltonian Monte Carlo sampling. As input parameters, we used 
/nl = 300 and N pix = 10 6 . For the approximate and exact analysis, 1000 and 15 000 samples were drawn, respectively. 
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Fig. 6. Example /nl chain. Left panel: We display the chain of 1000 generated /nl samples which built up the histogram 
plotted on the right hand side in Fig. [5] Right panel: The autocorrelation function of /nl confirms the uncorrelatedness 
of the samples. 
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Fig. 7. Performance of the Hamilton Monte Carlo sampler. Left panel: Analysis of the data set of Fig. fusing the HMC 
sampler. Here, 15 000 samples were draw. Right panel: The autocorrelation function of /nl- 



are integrated for each parameter [xi] Pi} = {/nl, P/ N l, using the leapfrog method with step size St, 



x(t) 



Mt + -) = Pi (t) + ^ 

St , St. 
x t (t + St) = Xi(t) + — Pl (t + — ) 
mi 2 

n , St. St dlog[P(d,^,f NIj ,9)} 
Pi (t + St) = pi(t + -) + — 



(22) 



x(t+St) 



The equations of motion for Xi are straightforward to compute, as they only depend on the momentum variable. To 
integrate the evolution equations for the pi , we derive 



01og[P(d $ L ,/ NL ,fl)] = 3 _ {$ 2 >)tM t B tjv-i(d - BM$l - h h BM{$l <*»») 

C/NL 



dlog[P(rf, $l,/nl,i 



M^AT- 1 ^ - BAf$ L ) - F $ ^ L + 2/ NL diag(M t B t ^ 1 d)$ L + , 



(23) 



where we have truncated the gradient in the latter equation at order 0(^). The final point of the trajectory is accepted 
with probability p = min(l, exp[—AH]), where AH is the difference in energy between the end- and starting point. This 
accept/reject step allows us to restore exactness as it eliminates the error introduced by approximating the gradient in 
Eq. [231 and by the numerical integration scheme. In general, only accurate integrations where AH is close to zero result 
in high acceptance rates. Furthermore, the efficiency of a HMC sampler is sensitive to the choice of the free parameters 
which corresponds to a mass. This i ssue is of particular importance if the quantities of interest possess variances 



varying by orders of magnitude. Following iTavlor et al.l (|2008l) . we chose the masses inversely proportional to the diagonal 
elements of the covariance matrix which we reconstructed out of the solution of the sampling scheme from Sect. [3] We 
initialized the algorithm by performing one draw of <&nl from the conditional PDF P($nlM, &) and setting /nl = 0. The 
outcome of repeated analyses of the data set presented in Fig. [3] is shown in Fig. [7] The consistency of the distributions 
confirms the equivalence of the two sampling techniques in the high signal-to-noise regime. However, convergence for the 
HMC is far slower, even for the idealized choice for Wj and a reasonable starting guess, as can be seen from the large 
width of the autocorrelation function (see right panel of Fig. [7]) . 

We conclude, therefore, that the direct sampling scheme presented in Sect.[3]is more efficient than HMC when applied 
to the detection of local non-Gaussianities in the high signal-to-noise regime. However, as shown in the rightmost panel 
of Fig. [SJ the exact analysis using a HMC algorithm remains applicable at high noise level. 



7. Extension to realistic data 

Applying the method to a realistic CMB data set requires recovering the primordial potential $l on shells at numerous 
distances n from the origin to the present time cosmic horizon. Thus, the product of the transfer matrix M with the 
potential transforms to 

M^^^Mi^in). (24) 
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Here, the matrix M projects a weighted combination of the $(r - j) at different radii to a resulting two dimensional signal 
map s. The resolution of the r-grid can be coarser where the transfer functions for radiation are close to zero and must 
be finer at the distances of recombination and reionization. Another modification concerns the covariance matrix P$ of 
the potential. Now it additionally describes the correlation of $ on distinct shells at different distances, 

&P^* "> E *( r P *(ri),*(r,)*fo) > (25) 

and can be calculated from the primordial power spectrum V(k) predicted by inflation 

f*(r<),#(r 3 ) t = k 2 dkV{k) jtlkrx) j l {kr 2 ) ■ (26) 

To tighten the constraints on $, polarization information can be included into the analysis as well simply by replacing 
the temperature by the polarization transfer function in the expression for M. We plan to study the application of our 
methods to realistic CMB data in a future publication. 

The computational speed of a complete implementation is limited by harmonic transforms that scale as 0(N^( 2 ). 



8. Summary 

In this paper, we developed two methods to infer the amplitude of the non-Gaussianity parameter /nl from a data set 
within a Bayesian approach. We focused on the so called local type of non-Gaussianity and derived an expression for the 
joint probability distribution of /nl and the primordial curvature perturbations, $. Despite the methods are of general 
validity, we tailored our discussion to the case example of CMB data analysis. 

We developed an exact Markov Chain sampler that generates correlated samples from the joint density using the 
Hamiltonian Monte Carlo approach. We implemented the HMC sampler and applied it to a toy model consisting of sim- 
ulated measurements of a 1-D sky. These simulations demonstrate that the recovered posterior distribution is consistent 
with the level of simulated non-Gaussianity. 

With two approximations that exploit the fact that the non-Gaussian contribution to the signal is next order in per- 
turbation theory, we find a far more computationally efficient Monte Carlo sampling algorithm that produces independent 
samples from the /nl posterior. The regime of applicability for this approximation is for data with high signal-to-noise 
and weak non-Gaussianity. 

By comparison to the exact HMC sampler, we show that our approximate algorithm reproduces the posterior location 
and shape in its regime of applicability. If non-zero /nl is not supported by the data the method is biased towards 
Gaussianity. The approximate posterior more strongly prefers zero /nl compared to non-zero values than the exact 
posterior, as expected given the nature of the approximations which Gaussianize the prior. This method is therefore only 
applicable if the data contains sufficient support for the presence of non-Gaussianity essentially overruling the preference 
for Gaussianity in our approximate prior. 

Our efficient method enables us to perform a Monte Carlo study of the behavior of the posterior density for our toy 
model data with high signal-to-noise per pixel. We found that the width of the posterior distribution does not change as 
a function of the level of non-Gaussi anity in the data, contrary to the frequenti st estimator where there is an additional, 
/nl dependent, variance component (|Cremineili et al.ll2007t ILiguori et a l. 2007). Our results suggest that this may be an 
advantage of the Bayesian approach compared to the frequentist approach, motivating further study of the application 
of Bayesian statistics to the search for primordial local non-Gaussianity in current and future CMB data. 

We close on a somewhat philosophical remark. Even though we chose a Gaussian prior approximation for expediency, 
it may actually be an accurate model of prior belief for many cosmologists since canonical theoretical models predict 
Gaussian perturbations. From that perspective our fast, approximate method may offer some (philosophically interesting) 
insight into the question "what level of signal-to-noise in the data is required to convince someone of the presence of 
non-Gaussianity whose prior belief is that the primordial perturbations are Gaussian?" 
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